100 ◾ Bioinformatics
assembly. QUAST is an example of a program that uses the statistical approach. The second
approach relies on evolutionary relatedness to similar organisms to estimate the
number of genes in the genome and the completeness of those genes. BUSCO is an example of
a program that uses this approach.
3.3.1 Statistical Assessment for Genome Assembly
The statistical assessment of a genome assembly can be performed in two ways: a
reference-guided approach and a non-reference approach. One reason for using de novo genome
assembly is the unavailability of a reference genome for the species studied. However,
de novo assembly can also be used in some applications to assemble the genome of an
organism that has a reference genome. When an organism has no reference genome,
the de novo genome assembly can still be assessed without one. Some assemblers,
such as ABySS, generate useful statistics for assessing the assembled genome.
Table 3.1 lists the statistical metrics for a genome assembly. Not all assemblers have
modules that generate these assessment metrics; in that case, we can use a program like
QUAST (QUality ASsessment Tool) [11]. QUAST (version 5.0.2 here) can be downloaded,
decompressed, and installed as follows:
wget https://downloads.sourceforge.net/project/quast/quast-5.0.2.tar.gz
tar -xzf quast-5.0.2.tar.gz
cd quast-5.0.2
sudo ./setup.py install_full
This will install QUAST and set the path so that you can run it from any directory.
Note that the file path or name above may change in a future version. QUAST can
be used to assess genome assemblies (contigs or scaffolds) generated by de novo assemblers.
It performs either reference-guided assessment, which requires a reference genome sequence
for the species studied, or non-reference assessment, in which no reference sequence is used.
The latter is usually the case when the organism is unknown or has no reference genome. In
the following, we will assess the de novo E. coli genomes assembled with ABySS and SPAdes
above without using a reference sequence. You can copy the FASTA scaffold files created by
these two programs into a separate directory (e.g., “qc”) with new names using the Linux
command line as follows:
mkdir qc
cp abyss_ecoli_ass/ecoli-scaffolds.fa qc/abyss_ecoli_ass.fasta
cp spades_ecoli_ass/scaffolds.fasta qc/spades_ecoli_ass.fasta
cp hyb_spades_ecoli_ass/scaffolds.fasta qc/spades_hyb_ecoli_ass.fasta
With the above commands, we created the directory “qc” and copied the scaffold files
created by ABySS and SPAdes into it.
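Before running QUAST, it can help to confirm that the copied files are valid, non-empty FASTA files. The following is a minimal sketch, not part of QUAST, that prints the number of sequences, the total length, and the N50 of each file using only standard awk and sort. The function name assembly_stats is hypothetical and chosen for illustration. N50 is computed here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The values may differ from those in a QUAST report because QUAST, by default, ignores contigs shorter than a minimum length.

```shell
#!/bin/sh
# Sketch only (not part of QUAST): summarize one FASTA file.
# The function name "assembly_stats" is hypothetical.
assembly_stats() {
    # Stage 1: print one length per FASTA record.
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }  # header: flush previous record
        { gsub(/[ \t\r]/, ""); len += length($0) }              # sequence line: add its length
        END { if (seen) print len }                             # flush the last record
    ' "$1" |
    # Stage 2: sort lengths from longest to shortest.
    sort -rn |
    # Stage 3: accumulate lengths until half of the total is covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            run = 0; n50 = 0
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_length=%d N50=%d\n", NR, total, n50
        }'
}

# Summarize every FASTA file in the "qc" directory, if any exist.
for f in qc/*.fasta; do
    [ -e "$f" ] || continue
    printf '%s\t' "$f"
    assembly_stats "$f"
done
```

Run from the directory that contains “qc”, the loop prints one summary line per assembly. A file reporting zero sequences most likely points to a failed copy or an incomplete assembly run.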
Next, change into the new directory (“cd qc”) and run the following QUAST
command to perform quality assessment for the three genome assemblies.